    Temporal Logic Monitoring Rewards via Transducers

    In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting when the considered domain is not naturally Markovian but becomes so only after careful engineering of an extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NMRDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logics on finite traces such as LTLf/LDLf, with the great advantage of higher abstraction and succinctness; they can then be automatically compiled into an MDP with an extended state space. We contribute to the techniques for handling temporal rewards and to the solutions for engineering them. We first present an approach to compiling temporal rewards that merges the formula automata into a single transducer, sometimes saving up to an exponential number of states. We then define monitoring rewards, which add a further level of abstraction to temporal rewards by adopting the four-valued conditions of runtime monitoring; we argue that our compilation technique allows for an efficient handling of monitoring rewards. Finally, we discuss applications to reinforcement learning.
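
    The compilation described above can be pictured as one reward transducer obtained by taking the product of the formula automata: it reads the trace of propositional evaluations and outputs, at each step, the sum of the rewards of the formulas currently satisfied. Below is a minimal, illustrative sketch of that idea; it assumes each temporal reward has already been compiled into a DFA, and the names (RewardDFA, RewardTransducer) are not from the paper.

```python
# Sketch: merging several temporal-reward DFAs into a single reward transducer.
# Each DFA reads propositional evaluations of MDP states; a formula pays its
# reward whenever its DFA component is in an accepting state.
# (Illustrative names only; not the paper's implementation.)

class RewardDFA:
    def __init__(self, initial, transitions, accepting, reward):
        self.initial = initial            # initial DFA state
        self.transitions = transitions    # dict: (state, symbol) -> state
        self.accepting = accepting        # set of accepting states
        self.reward = reward              # reward paid while accepting

class RewardTransducer:
    """Product of several RewardDFAs: one state vector, summed reward output."""

    def __init__(self, dfas):
        self.dfas = dfas
        self.state = tuple(d.initial for d in dfas)

    def step(self, symbol):
        self.state = tuple(
            d.transitions[(q, symbol)] for d, q in zip(self.dfas, self.state)
        )
        # Output: sum of the rewards of the formulas currently satisfied.
        return sum(
            d.reward for d, q in zip(self.dfas, self.state) if q in d.accepting
        )

# Toy example: reward 1.0 once proposition "goal" has been seen ("eventually goal").
eventually_goal = RewardDFA(
    initial=0,
    transitions={(0, "goal"): 1, (0, "other"): 0, (1, "goal"): 1, (1, "other"): 1},
    accepting={1},
    reward=1.0,
)
t = RewardTransducer([eventually_goal])
print([t.step(s) for s in ["other", "goal", "other"]])  # [0, 1.0, 1.0]
```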

    Exploiting Multiple Abstractions in Episodic RL via Reward Shaping

    One major limitation to the applicability of Reinforcement Learning (RL) to many practical domains is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below it in the hierarchy. In this work, we propose a novel form of Reward Shaping in which the solution obtained at the abstract level is used to offer rewards to the more concrete MDP, so that the abstract solution guides learning in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes few requirements on the design of the abstract models and is also tolerant to modeling errors, which makes the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain. Moreover, we prove that the method guarantees convergence to an optimal policy, and we demonstrate its effectiveness experimentally. (This is an extended version of the paper presented at AAAI 2023, https://doi.org/10.1609/aaai.v37i6.2588.)
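
    One simple way to realise "the abstract solution offers rewards to the concrete MDP" is potential-based reward shaping, with the abstract value function as the potential. The sketch below assumes the abstract MDP has already been solved and that an abstraction mapping from ground to abstract states is available; it illustrates the general idea and is not the paper's exact construction.

```python
# Sketch of potential-based reward shaping driven by an abstract solution.
# Assumptions (not from the paper): the abstract MDP is already solved,
# abstract_value maps abstract states to values, and abstraction(s) maps a
# ground state to its abstract state.

def make_shaped_reward(reward_fn, abstraction, abstract_value, gamma=0.99):
    """Wrap a ground reward function with a potential-based shaping term.

    reward_fn(s, a, s_next)  -> ground reward
    abstraction(s)           -> abstract state of ground state s
    abstract_value[alpha]    -> value of abstract state alpha (e.g. from value iteration)
    """
    def phi(s):
        return abstract_value[abstraction(s)]

    def shaped_reward(s, a, s_next):
        # F(s, s') = gamma * phi(s') - phi(s): shaping of this form preserves
        # the optimal policy of the ground MDP.
        return reward_fn(s, a, s_next) + gamma * phi(s_next) - phi(s)

    return shaped_reward

# Toy usage: a 1-D corridor of 9 cells whose abstraction groups cells into rooms of size 3.
abstract_value = {0: 0.0, 1: 0.5, 2: 1.0}                  # e.g. solved abstract MDP
base_reward = lambda s, a, s_next: 1.0 if s_next == 8 else 0.0
shaped = make_shaped_reward(base_reward, lambda s: s // 3, abstract_value)
print(shaped(2, "right", 3))   # crossing into a better room adds a positive shaping term
```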

    A PoW-less Bitcoin with Certified Byzantine Consensus

    Distributed Ledger Technologies (DLTs), when managed by a few trusted validators, require most but not all of the machinery available in public DLTs. In this work, we explore one possible way to profit from this state of affairs. We devise a combination of a modified Practical Byzantine Fault Tolerant (PBFT) protocol and a revised Flexible Round-Optimized Schnorr Threshold Signatures (FROST) scheme, and we then inject the resulting proof-of-authority consensus algorithm into Bitcoin (chosen for the reliability, openness, and liveness it brings), replacing its PoW machinery. The combined protocol may operate as a modern, safe foundation for digital payment systems and Central Bank Digital Currencies (CBDCs).
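
    As a rough illustration of the proof-of-authority acceptance rule, the toy sketch below checks that a block is endorsed by a PBFT-style quorum of 2f+1 out of n = 3f+1 permissioned validators. A real deployment would verify a single FROST threshold signature rather than counting individual endorsements; all names here are illustrative, not the paper's code.

```python
# Toy sketch of PBFT-style finality for a permissioned (proof-of-authority) ledger:
# a block is final once 2f+1 of the n = 3f+1 validators have endorsed it.
# Endorsements are abstracted as validator ids; real systems would check signatures
# (e.g. one aggregated FROST threshold signature).

def quorum_size(n_validators: int) -> int:
    # PBFT tolerates f Byzantine validators out of n = 3f + 1.
    f = (n_validators - 1) // 3
    return 2 * f + 1

def is_final(block_hash: str, endorsements: dict, validators: set) -> bool:
    """endorsements: validator_id -> hash of the block that validator endorsed."""
    votes = {v for v, h in endorsements.items() if v in validators and h == block_hash}
    return len(votes) >= quorum_size(len(validators))

validators = {"v1", "v2", "v3", "v4"}             # n = 4, so f = 1 and the quorum is 3
endorsements = {"v1": "abc", "v2": "abc", "v3": "abc", "v4": "xyz"}
print(is_final("abc", endorsements, validators))  # True
```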

    Standard Grammars for LTL and LDL

    The heterogeneity of tools that support temporal logic formulae poses several challenges in terms of interoperability. This document proposes standard grammars for Linear Temporal Logic (LTL) (Pnueli 1977) and Linear Dynamic Logic (LDL) (Vardi 2011; De Giacomo and Vardi 2013).
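
    To give a flavour of what such a grammar fixes (operator names, precedence, associativity), here is a small hand-written parser for a tiny LTL fragment. It is purely illustrative and is not the grammar proposed in the document.

```python
# Illustrative parser for a tiny LTL fragment: atoms, !, &, |, X, F, G, U.
# Not the standard grammar proposed in the document; just a sketch of the
# kind of syntax a standard grammar would pin down.

import re

TOKEN = re.compile(r"\s*([()!&|UXFG]|[a-z_][a-z0-9_]*)")

def tokenize(s):
    pos, out = 0, []
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError(f"bad input at {s[pos:]!r}")
        out.append(m.group(1))
        pos = m.end()
    return out

def parse(tokens):
    """formula := until ( ('&' | '|') until )*   (lowest precedence)"""
    node, rest = parse_until(tokens)
    while rest and rest[0] in ("&", "|"):
        op, rest = rest[0], rest[1:]
        rhs, rest = parse_until(rest)
        node = (op, node, rhs)
    return node, rest

def parse_until(tokens):
    """until := unary ('U' unary)*"""
    node, rest = parse_unary(tokens)
    while rest and rest[0] == "U":
        rhs, rest = parse_unary(rest[1:])
        node = ("U", node, rhs)
    return node, rest

def parse_unary(tokens):
    """unary := ('!' | 'X' | 'F' | 'G') unary | atom | '(' formula ')'"""
    head, rest = tokens[0], tokens[1:]
    if head in ("!", "X", "F", "G"):
        node, rest = parse_unary(rest)
        return (head, node), rest
    if head == "(":
        node, rest = parse(rest)
        return node, rest[1:]          # drop the closing ')'
    return ("atom", head), rest        # lowercase identifier = atomic proposition

tree, _ = parse(tokenize("G (request & X F grant)"))
print(tree)   # ('G', ('&', ('atom', 'request'), ('X', ('F', ('atom', 'grant')))))
```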

    Compositional Approach to Translate LTLf/LDLf into Deterministic Finite Automata

    The translation from temporal logics to automata is the workhorse algorithm of several techniques in computer science and AI, such as reactive synthesis, reasoning about actions, FOND planning with temporal specifications, and reinforcement learning with non-Markovian rewards, just to name a few. Unfortunately, the problem is computationally intractable, requiring the implementation of several heuristics to make it usable in practice. In this paper, following the recent interest in temporal logic formalisms over finite traces, we present a compositional approach to translating Linear Temporal Logic (LTLf) and Linear Dynamic Logic (LDLf) on finite traces into Deterministic Finite Automata (DFA). That is, we inductively transform each LTLf/LDLf subformula into a DFA and combine them through automata operators. By relying on efficient semi-symbolic automata representations, we empirically show the effectiveness of our approach and its competitiveness with similar tools. Moreover, this is the first work that provides a scalable and practical tool supporting the translation to DFA not only for LTLf but also for full LDLf.
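
    The compositional idea can be sketched with explicit-state automata: build a DFA for each subformula and combine them with automata operations, e.g. product for conjunction and complementation for negation. The toy code below illustrates this; the actual tool relies on semi-symbolic representations rather than explicit states, and the class here is not its API.

```python
# Sketch of compositional DFA construction: subformula DFAs combined through
# automata operations (complement for negation, product for conjunction).
# Explicit-state and purely illustrative.

from itertools import product as cartesian

class DFA:
    def __init__(self, states, alphabet, initial, delta, accepting):
        self.states, self.alphabet = states, alphabet
        self.initial, self.delta, self.accepting = initial, delta, accepting

    def complement(self):
        # Negation of a formula = complement of its (complete) DFA.
        return DFA(self.states, self.alphabet, self.initial, self.delta,
                   self.states - self.accepting)

    def intersect(self, other):
        # Conjunction of two formulas = product of their DFAs.
        states = set(cartesian(self.states, other.states))
        delta = {((p, q), a): (self.delta[(p, a)], other.delta[(q, a)])
                 for (p, q) in states for a in self.alphabet}
        accepting = {(p, q) for (p, q) in states
                     if p in self.accepting and q in other.accepting}
        return DFA(states, self.alphabet, (self.initial, other.initial),
                   delta, accepting)

    def accepts(self, word):
        q = self.initial
        for a in word:
            q = self.delta[(q, a)]
        return q in self.accepting

# Toy subformula DFAs over the alphabet {"a", "b"}:
# "F a" (eventually a) and "G b" (always b), combined as F a & !(G b).
sigma = {"a", "b"}
eventually_a = DFA({0, 1}, sigma, 0,
                   {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}, {1})
always_b = DFA({0, 1}, sigma, 0,
               {(0, "b"): 0, (0, "a"): 1, (1, "a"): 1, (1, "b"): 1}, {0})
combined = eventually_a.intersect(always_b.complement())
print(combined.accepts(["b", "a"]))   # True: "a" eventually holds, "b" does not always hold
```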

    Planning for Temporally Extended Goals in Pure-Past Linear Temporal Logic: A Polynomial Reduction to Standard Planning

    We study temporally extended goals expressed in Pure-Past LTL (PPLTL). PPLTL is particularly interesting for expressing goals, since it allows sophisticated tasks to be expressed as in the Formal Methods literature, while the worst-case computational complexity of planning in both deterministic and nondeterministic domains (FOND) remains the same as for classical reachability goals. However, while the theory of planning for PPLTL goals is well understood, practical tools have not been specifically investigated. In this paper, we make a significant leap forward in the construction of actual tools to handle PPLTL goals. We devise a technique to polynomially translate planning for PPLTL goals into standard planning. We show the formal correctness of the translation, its complexity, and its practical effectiveness through comparative experiments. As a result, our translation enables state-of-the-art tools, such as FD or MyND, to handle PPLTL goals seamlessly, maintaining the impressive performance they have for classical reachability goals.
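
    The intuition behind the polynomial translation is that the truth of a pure-past subformula at step i depends only on the current state and on the subformula values at step i-1, so each subformula can be tracked by one extra boolean fluent maintained by the translated domain. The sketch below illustrates this bookkeeping on a trace; it is an illustration of the idea, not the paper's exact encoding.

```python
# Sketch: evaluating PPLTL formulas step by step using only last step's
# subformula values, which is why one boolean fluent per subformula suffices
# in the translated planning domain.  Illustrative encoding only.

def eval_ppltl(formula, state, prev):
    """Evaluate a PPLTL formula given the current state (a set of true
    propositions) and `prev`, a dict with the previous step's subformula values."""
    kind = formula[0]
    if kind == "atom":
        return formula[1] in state
    if kind == "not":
        return not eval_ppltl(formula[1], state, prev)
    if kind == "and":
        return all(eval_ppltl(f, state, prev) for f in formula[1:])
    if kind == "yesterday":                       # Y f: f held at the previous step
        return prev.get(("Y", formula[1]), False)
    if kind == "since":                           # f1 S f2
        f1, f2 = formula[1], formula[2]
        return eval_ppltl(f2, state, prev) or (
            eval_ppltl(f1, state, prev) and prev.get(("S", f1, f2), False))
    raise ValueError(kind)

def advance(subformulas, state, prev):
    """Produce the next step's bookkeeping fluents from the current step."""
    nxt = {}
    for f in subformulas:
        if f[0] == "yesterday":
            nxt[("Y", f[1])] = eval_ppltl(f[1], state, prev)
        elif f[0] == "since":
            nxt[("S", f[1], f[2])] = eval_ppltl(f, state, prev)
    return nxt

# Toy goal: "clean S requested" (the room has stayed clean since it was requested).
goal = ("since", ("atom", "clean"), ("atom", "requested"))
trace = [{"requested", "clean"}, {"clean"}, {"clean"}]
prev = {}
for state in trace:
    holds = eval_ppltl(goal, state, prev)
    prev = advance([goal], state, prev)
print(holds)   # True: the goal is satisfied at the end of the trace
```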

    A Practical Framework for General Dialogue-Based Bilateral Interactions

    For autonomous agents and services to cooperate and interact in multi-agent environments, they require well-defined protocols. A multitude of protocol languages for multi-agent systems have been proposed in the past, but they have mostly remained theoretical or have only limited prototypical implementations. This work proposes a practical realisation of a general framework for defining dialogue-based bilateral interaction protocols that supports arbitrary agent-based interactions. Crucially, this work is tightly integrated with a modern framework for the creation of autonomous agents and multi-agent systems, making it possible to go from the specification of protocols to their implementation and use by agents, and enabling the evaluation of a protocol's effectiveness and applicability in real-world use cases.
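
    A dialogue-based bilateral protocol can be pictured as a finite state machine over typed messages exchanged by the two parties. The sketch below is a toy, framework-agnostic illustration; the protocol, performative names, and functions are assumptions, not part of the framework described above.

```python
# Toy bilateral dialogue protocol as a finite state machine: each entry says
# which performative which party may send in a given state, and where the
# dialogue moves next.  Illustrative only.

# (state, sender, performative) -> next state
PROPOSE_PROTOCOL = {
    ("start",     "initiator", "propose"):          "proposed",
    ("proposed",  "responder", "accept"):           "agreed",
    ("proposed",  "responder", "reject"):           "failed",
    ("proposed",  "responder", "counter-propose"):  "countered",
    ("countered", "initiator", "accept"):           "agreed",
    ("countered", "initiator", "reject"):           "failed",
}
FINAL_STATES = {"agreed", "failed"}

def run_dialogue(protocol, messages, state="start"):
    """Check a message sequence against the protocol, returning the final state."""
    for sender, performative in messages:
        key = (state, sender, performative)
        if key not in protocol:
            raise ValueError(f"illegal move {performative!r} by {sender!r} in state {state!r}")
        state = protocol[key]
    if state not in FINAL_STATES:
        raise ValueError(f"dialogue ended in non-final state {state!r}")
    return state

print(run_dialogue(PROPOSE_PROTOCOL, [
    ("initiator", "propose"),
    ("responder", "counter-propose"),
    ("initiator", "accept"),
]))   # agreed
```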

    Restraining Bolts for Reinforcement Learning Agents

    In this work, we have investigated the concept of a “restraining bolt”, inspired by Science Fiction. We have two distinct sets of features extracted from the world: one used by the agent and one used by the authority imposing restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties; however, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent over a set of low-level (subsymbolic) features, while the restraining bolt is specified logically, using linear-time logic on finite traces (LTLf/LDLf), over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
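
    Schematically, the bolt advances a DFA (compiled from the LTLf/LDLf specification) on the high-level features and pays an extra reward in its accepting states, while the agent learns over the pair (its own observation, bolt automaton state). The sketch below is illustrative only; the names and signatures are assumptions, not the paper's code.

```python
# Sketch of the restraining-bolt setup: the learning agent sees only its own
# low-level observation, while the bolt tracks the world through separately
# extracted high-level features and a DFA compiled from an LTLf/LDLf spec.
# Learning happens over (observation, bolt automaton state); the bolt pays an
# extra reward in its accepting states.  Illustrative names only.

class RestrainingBolt:
    def __init__(self, transitions, initial, accepting, bonus):
        self.transitions = transitions   # (dfa_state, high_level_symbol) -> dfa_state
        self.state = initial
        self.accepting = accepting
        self.bonus = bonus

    def step(self, high_level_symbol):
        self.state = self.transitions[(self.state, high_level_symbol)]
        return self.bonus if self.state in self.accepting else 0.0

def bolted_step(env_step, bolt, high_level_features, action):
    """One interaction step: the agent's reward is augmented by the bolt's."""
    obs, reward, done = env_step(action)             # agent's own low-level view
    bolt_reward = bolt.step(high_level_features(obs))
    # The RL agent should learn over the pair (obs, bolt.state).
    return (obs, bolt.state), reward + bolt_reward, done

# Toy usage: the bolt pays 1.0 once the high-level event "delivered" is seen.
bolt = RestrainingBolt(
    transitions={(0, "delivered"): 1, (0, "none"): 0, (1, "delivered"): 1, (1, "none"): 1},
    initial=0, accepting={1}, bonus=1.0,
)
fake_env_step = lambda a: ({"pos": a}, 0.0, False)
features = lambda obs: "delivered" if obs["pos"] == 3 else "none"
print(bolted_step(fake_env_step, bolt, features, action=3))  # (({'pos': 3}, 1), 1.0, False)
```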